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00 ! Abstract. Currently two query languages are denned as standards for 

I/"") ■ the Virtual Observatory (VO). Astronomical Data Query Language (ADQL) 

is used for catalog data query and Simple Image Access Protocol (SIAP) 
is for image data query. As a result, when we query each data service, 
, we need to know in advance which language is supported and then con- 

' struct a query language accordingly. The construct of SIAP is simple, but 

they have a limited capability. For example, there is no way to specify 
multiple regions in one query, and it is difficult to specify complex query 
conditions. In this paper, we propose a unified query language for any 
^ | kind of astronomical database on the basis of SQL99. SQL is a query 

Ci ■ language optimized for a table data, so to apply the SQL to the image 

and spectrum data set, the data structure need to be mapped to a table 
like structure. We present specification of this query language and an 
rS ' example of the architecture for the database system. 
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1. Introduction 

At present, two kinds of data query language are defined as standards in the 
Virtual Observatory. One is the parameter query, which is used for image data 
search and called as "Simple Image Access Protocol" or SIAP. Search criteria 
are specified by a set of "key" and "value" pairs. Another one is a structured 
query language, which is used for catalog data query and called as "Astronomical 
Data Query Language" or ADQL (Yasuda et al. 2004). ADQL can specify more 
complex search criteria than the SIAP does, also has an ability to join multiple 
tables, and can select values derived from the DB columns. So it is worthwhile to 
adapt the ADQL to perform an image query. The high flexibility of the ADQL 
syntax, however, raise difficulties to develop an ADQL compliant data service. 
It is known that incompatibility among various DBMS products is due to the 
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complexity of the SQL specification. The success of the Virtual Observatory 
project depends on uniformity of interfaces of all the astronomical data services, 
so we need to avoid the complexity and make it simple and easy to implement. 

2. Syntax Specification 

In order to realize the interoperability among the distributed data archives, we 
need to define a standard query language. As the standard query language must 
be supported by all the data services, it's specification should be simple and 
clear for easy implementation. On the other hand, some data service require 
more sophisticated query syntax to allow users to specify more efficient search 
criteria. Thus we defined basic syntax which must be supported by all the data 
services and allowed several enhancements on the basic syntax. 

2.1. Basic syntax 

The following restrictions are applied to the "select" statement of SQL99 speci- 
fication. 

1. Only a column name or "*" is specified in the selection list, so an algebraic 
expression and a function are not allowed. So the role of the "select" 
clause is just to specify the columns you want to get. The calculation of 
the derivative from those columns should be done on a portal server or by 
users themselves. 

2. Only one table is specified in the "from" clause, so any type of table join 
syntax is not allowed. The join of the tables should be done on a portal 
server or by users themselves. 

3. Allowed comparison operators and predicates are =, <, >, >=, <=, <>, 
within, contains, overlaps, LIKE and BETWEEN, and any other opera- 
tors are not allowed. The operators within, contains and overlaps are 
not specified in SQL99, but are introduced to specify a region on the sky 
as such search criterion is a fundamental one for the astronomical data 
service. These three operators compare two values of geometry data type, 
which is described in the next subsection. 

4. Functions point () , circle () and box() are introduced to express a region 
on the sky, and distance () is also introduced to describe the proximity 
of the two coordinates. 

5. Allowed logical operators are AND and NOT. OR is not allowed as the mixture 
of AND and OR makes the search condition too complex. 

6. Table and column may have an alias name. 

2.2. Geometry Data Type 

The geometry data type is introduced to represent a point or a region in the 
sky. A point is always expressed by two coordinate values, and in most of the 
cases a single coordinate value is meaningless and only the pair have a phys- 
ical meaning. So it is natural to have a column which has a pair of values 
to express the coordinate of celestial object, then it becomes simple to de- 
scribe the search criterion. For an example, if the coordinate values are pre- 
pared in two separate columns, the search condition is expressed as follows: 
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point (tl .ra, tl.dec) within box((t2.ra, t2.dec), 1.0, 1.0). On the 

other hand, if the coordinate values are prepared in one single column, the same 
statement can be expressed shortly as follows: tl. point within t2. region. 
We defined "Point", "Circle" and "Box" data types. Expressions of these data 
types in SQL are summarized in Table 1. 



Table 1. Expressions of the geometry data type value 



Data Type 




Examples of Expression 




Point 
Box 
Circle 


Point (23. , 
Box((23.0, 
Circle ((23 


+10., 'FK5'), Point (23. , +10.) 
+10.0) , 1, 1), ((23.0, +10.0) , 
.0, +10.0) , 1), ((23.0, +10.0) , 


, (23., +10.) 
1, 1) 
1) 



2.3. Example SQLs for Basic Syntax 

The next SQL shows an example of catalog data query. 

Select g.ra, g.dec, g.mag_r 
From galaxy as g 

Where Point (g.ra, g.dec) within Circle ((24. 3, +5.0), 1.0) 
and g.mag_r < 24 

This query describes "from a table named galaxy select right ascension, decli- 
nation, and R-band magnitude of celestial objects located in the circle whose 
center coordinate is (ra, dec) = (24.3 deg, +5.0 deg) and radius is 1.0 deg". In 
the "galaxy" table, coordinates of the objects are prepared in the two separate 
columns, "ra" and "dec", so Point() function is required to describe the search 
region. In the "galaxy2" table of the next SQL sample, the coordinates are 
prepared in a single column "point". 

Select g. point, g.mag_r 
From galaxy2 as g 

Where g. point within Circle ((24. 3, +5.0), 1.0) and g.mag_r < 24 

The next SQL shows an example of an image data query. 

Select img. filter, img.imageURL 
From imageData as img 

Where img. region = BOX ((24. 3, +5.0), 0.1, 0.1) 

This equivalents to the next URL-based SIAP. 

http : // j vo . nao . ac . jp/image?P0S=24 .3,5. 0&SIZE=0 . 1&F0RMAT=V0TABLE 

In the above SQL, the column region is used as a parameter to specify the 
region of interest, and the access URL of the cut-out image is returned as a data 
of imageURL column. Those columns are actually not present in the relational 
table on the DBMS, but user can specify as if it virtually exists. So these 
columns are called as virtual columns and its table is called as a virtual table or 
a view in the RDB terminology. 
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7V0 SkyNode DBMS 




Figure 1. Architecture of JVO Skynode designed to be compatible 
with the proposed VOQL. 



2.4. Syntax Enhancement 

The following enhancements may be applied as optional features. 

1. Multiple tables enhancement: Multiple tables can be queried with a single 
SQL. External tables provided in a form of VOTable must be treated in 
the same manner as the original tables of the data service. The external 
table is specified as EXT: : <f ileNumber> . <resourceName> . <tableName>. 
Join predicate and sub-query are not mandatory. 

2. Unit support enhancement: Any numeric value may be followed by a unit. 
Unit conversion must be carried out by the data service. 

3. Algebraic expression enhancement: An algebraic expression can be speci- 
fied in the selection list and search criteria in the "where" clause. 

4. Logical operator OR enhancement: Support for mixture of AND, NOT and OR 
in the "where" clause. 

5. Object data type enhancement: A column may have a structured data and 
access methods. 

6. Use of identifier for specifying a table name (portal): A table name is 
expressed by an identifier of dot notation to specify the table uniquely in 
the VO. A dot character in the identifier must be escaped by a back slash. 
For an example, nao j : sxds . v0\ . 1 . galaxy) where nao j is an authority 
name, /sxds/vO. 1 is a catalog resource path name and galaxy is a table 
resource name. This is a feature dedicated to a portal data service. 

7. UCD (portal): A UCD (Derriere et al. 2004) can be used in place of a 
column name. This is a feature dedicated to a portal data service. The 
portal service searches data resources which have UCD specified in the SQL 
and translate to the column name using the column meta data collected 
from the data services. 

8. Omission of "From" part (portal): This is a feature dedicated to a portal 
data service. The portal searches the data resources from the registry 
according to the query condition. In this case, UCD must be used for 
describing columns and a query condition. 



3. JVO Skynode Architecture 

We have designed an architecture to implement a data service compatible with 
the proposed query language (Figure 1). The "gateway" in the figure provides 
interfaces to access the skynode (Yasuda et al. 2004) services. The received 
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FITS metadata 
table: tO 



FITS image cutout 

HTM index request pyrometer 
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materialized image data table 
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Figure 2. An example to materialize the virtual image table. 



query is transferee! to the JVO SkyNode DBMS, and the result is returned in 
a form of a ResultSet Java object. The access interface is provided by JDBC 
which is a standard interface to access the DBMS in the Java environment. 
Celestial object catalog is stored in a relational data base (RDB) system, such 
as PostgreSQL, MySQL, Oracl, and so on, and FITS files are managed in the 
unix files system and its meta data is stored in the RDB. HTM index (Kunszt et 
al. 2000) is used to perform a fast data search. Query condition related to the 
search region on the sky is converted to a condition on the HTM index ranges 
at the "Data Service Engine", and then the modified SQL is submitted to the 
DBMS. In the case of image data query, it is necessary to materialize the virtual 
image table as follows: First, parameters specifying the image cut out regions 
are stored in a table t2 (Figure 2) and the corresponding HTM index range table 
t3 is also created. Then FITS meta data table tO is joined with table t2 with 
intervention of the HTM index table tl and t3, then the virtual image table is 
materialized as shown in the bottom of the figure. 
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